Smart Paper Notes

Using Multi-modal Data for Improving Generalizability and Explainability of Disease Classification in Radiology

Pranav Agnihotri, Sara Ketabi, Khashayar Namdar, Farzad Khalvati

Category: Computer Vision | Machine Learning

2022-07-29

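Traditional datasets for radiological diagnosis tend to provide radiology images alongside radiology reports. However, radiology reading as performed by radiologists is a complex process, and information such as the radiologist's eye fixations during reading has the potential to be a valuable data source to learn from. Collecting such data is, however, expensive and time-consuming, which raises the question of whether it is worth the investment. This paper uses the recently published Eye Gaze dataset to carry out an exhaustive study of how different levels of input features affect the performance and explainability of deep learning (DL) classification. The input features are radiology images, radiology report text, and radiologist eye-gaze data. We find that the best classification performance on X-ray images is achieved by combining radiology report free text with radiology images, while eye-gaze data provides no performance boost. Nonetheless, some models use eye-gaze data as a secondary ground truth alongside the class labels. These models produce better attention maps than models trained for classification and attention-map generation without eye-gaze data.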
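One way to read "eye-gaze data as a secondary ground truth alongside the class label" is as a two-term training objective. The first term is cross-entropy on the class label. The second term penalizes the difference between the model's attention map and a gaze heatmap. The sketch below is a minimal illustration under stated assumptions: the mean-squared-error map term, the weight `lam`, and all function names are hypothetical, not the paper's stated loss.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example; `label` is the true class index."""
    m = max(logits)  # shift by the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def map_mse(pred_map, gaze_map):
    """Mean squared error between a predicted attention map and an eye-gaze heatmap,
    both flattened to lists of equal length."""
    return sum((p - g) ** 2 for p, g in zip(pred_map, gaze_map)) / len(gaze_map)

def multitask_loss(logits, label, pred_map, gaze_map, lam=1.0):
    """Hypothetical objective: class label as primary target, gaze heatmap as secondary target."""
    return cross_entropy(logits, label) + lam * map_mse(pred_map, gaze_map)

# Toy example: 3 classes (e.g. normal / CHF / pneumonia), 2x2 maps flattened to 4 cells.
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.9, 0.1, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
print(round(loss, 4))
```

Setting `lam=0` recovers a plain classifier. This makes it easy to compare models trained with and without the gaze target, which is the kind of comparison the abstract describes.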

Improving Disease Classification Performance and Explainability of Deep Learning Models in Radiology with Heatmap Generators

Akino Watanabe, Sara Ketabi, Khashayar Namdar, Farzad Khalvati

Category: Computer Vision | Machine Learning

2022-06-28

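As deep learning is widely used in radiology, the explainability of such models is increasingly essential for gaining clinicians' trust when the models are used for diagnosis. In this study, three sets of experiments were conducted with a U-Net architecture. The goal was to improve classification performance and, by incorporating heatmap generators during training, to enhance the heatmaps corresponding to the model. All experiments used a dataset of chest X-rays. Each image carries a label from one of three conditions ("normal", "congestive heart failure (CHF)", and "pneumonia") and numerical information about the radiologist's eye-gaze coordinates on it. The paper that introduced this dataset (A. Karargyris and Moradi, 2021) developed a U-Net model to show how eye-gaze data can be used in multi-modal training to improve explainability, and that model serves as the baseline in this study. To compare classification performance, 95% confidence intervals (CI) of the area under the receiver operating characteristic curve (AUC) were measured. The best method achieved an AUC of 0.913 (CI: 0.860-0.966). The largest improvements were for the "pneumonia" and "CHF" classes, which the baseline model struggled most to classify, yielding AUCs of 0.859 (CI: 0.732-0.957) and 0.962 (CI: 0.933-0.989), respectively. The decoder of the proposed method was also able to produce probability masks highlighting the image regions that determined the model's classification, similar to the radiologist's eye-gaze data. Hence, this work shows that incorporating heatmap generators and eye-gaze information into training can simultaneously improve disease classification and provide explainable visuals. These visuals align well with how the radiologist viewed the chest X-rays when making a diagnosis.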
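The abstract reports AUCs with 95% confidence intervals but does not say how the intervals were computed. A common choice is the percentile bootstrap, and the sketch below assumes it. It computes AUC as the probability that a positive case outranks a negative one, then resamples cases with replacement. The bootstrap method, function names, and toy data are all assumptions for illustration.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute AUC, take quantiles.
    Resamples containing only one class are skipped (AUC is undefined for them)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lo, hi

# Toy one-vs-rest labels (1 = "pneumonia", 0 = any other class) and model scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
point, lo, hi = bootstrap_auc_ci(scores, labels)
print(f"AUC {point:.3f} (CI: {lo:.3f}-{hi:.3f})")
```

For a three-class problem such as normal/CHF/pneumonia, each per-class AUC would be computed one-vs-rest in this way, with label 1 for that class and 0 otherwise.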

Large Language Models Encode Clinical Knowledge

Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl

Category: Natural Language Processing

2022-12-26

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
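Prompt tuning in general keeps a pretrained model frozen and learns only a small set of continuous "soft prompt" vectors. The abstract describes instruction prompt tuning as a parameter-efficient approach in this family. The sketch below shows only the generic mechanism: learned vectors are prepended to the embedded hard prompt (instructions, exemplars, question). The toy sizes and names are hypothetical, and the training loop that would update `soft_prompt` is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_prompt = 50, 8, 4  # toy sizes, not PaLM's real dimensions

# Frozen: the pretrained model's token-embedding table is never updated.
embedding_table = rng.normal(size=(vocab_size, d_model))
# Trainable: the only parameters learned during prompt tuning.
soft_prompt = rng.normal(scale=0.01, size=(n_prompt, d_model))

def build_input(soft_prompt, token_ids, embedding_table):
    """Prepend the learned soft-prompt vectors to the embeddings of the hard prompt
    (instructions, exemplars, and question) before running the frozen model."""
    token_embeds = embedding_table[np.asarray(token_ids)]  # (seq_len, d_model)
    return np.concatenate([soft_prompt, token_embeds], axis=0)

x = build_input(soft_prompt, [3, 17, 42], embedding_table)
print(x.shape)  # (7, 8): 4 soft-prompt vectors + 3 token embeddings
```

Only `n_prompt * d_model` numbers are trained, which is what makes this family of methods parameter-efficient relative to fine-tuning the whole model.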

Design interpretable experience of dynamical feed forward machine learning model for forecasting NASDAQ

Pouriya Khalilian, Sara Azizi, Mohammad Hossein Amiri, Javad T. Firouzjaee

Category: Artificial Intelligence

2022-12-22

National Association of Securities Dealers Automated Quotations (NASDAQ) is an American stock exchange. It is one of the most valuable stock market indices in the world and is located in New York City \cite{pagano2008quality}. The stock market is volatile, and economic indicators such as crude oil, gold, and the dollar influence it. NASDAQ shares are affected in the same way and have a volatile and chaotic nature \cite{firouzjaee2022lstm}. In this article, we have examined the effect of oil, the dollar, gold, and stock-market volatility on the economic market, and then the effect of these indicators on NASDAQ stocks. We then analyzed the feedback of past NASDAQ prices and its impact on the current price. Using PCA and a linear regression algorithm, we have designed an optimal dynamic learning experience for modeling these stocks. The results of the quantitative analysis are consistent with the results of qualitative economic studies. The modeling done with the optimal dynamic machine learning experience justifies the current price of NASDAQ shares.
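The pipeline described above (indicators plus past prices, reduced with PCA, then linear regression) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' model. The feature layout (indicators at time t plus lagged prices), the number of lags and components, and all function names are assumptions.

```python
import numpy as np

def make_features(target, indicators, n_lags=2):
    """Rows of [indicators at t, target at t-1 ... t-n_lags], used to predict target at t.
    `indicators` has shape (T, k), e.g. columns for oil, gold, dollar, volatility."""
    X, y = [], []
    for t in range(n_lags, len(target)):
        lags = target[t - n_lags:t][::-1]  # most recent lag first
        X.append(np.concatenate([indicators[t], lags]))
        y.append(target[t])
    return np.array(X), np.array(y)

def fit_pca_regression(X, y, n_components):
    """Center X, project onto its top principal components (via SVD), then fit OLS."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]                 # (n_components, n_features)
    Z = (X - mean) @ components.T
    Z1 = np.column_stack([np.ones(len(Z)), Z])     # prepend an intercept column
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return mean, components, coef

def predict(model, X):
    mean, components, coef = model
    return coef[0] + ((X - mean) @ components.T) @ coef[1:]

# Synthetic data only: a price series driven by its own lag and four indicators.
rng = np.random.default_rng(0)
T = 300
indicators = rng.normal(size=(T, 4))
weights = np.array([1.0, -0.5, 0.3, 0.2])
target = np.zeros(T)
for t in range(1, T):
    target[t] = 0.5 * target[t - 1] + indicators[t] @ weights + 0.01 * rng.normal()

X, y = make_features(target, indicators, n_lags=2)
model = fit_pca_regression(X, y, n_components=3)
print("in-sample MSE:", np.mean((predict(model, X) - y) ** 2))
```

Keeping all components reproduces ordinary least squares on the raw features. Keeping fewer trades some fit for a more compact, less collinear set of regressors.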

Removing Objects From Neural Radiance Fields

Silvan Weder, Guillermo Garcia-Hernando, Aron Monszpart, Marc Pollefeys, Gabriel Brostow, Michael Firman, Sara Vicente

Category: Computer Vision

2022-12-22

Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects from a NeRF representation created from an RGB-D sequence. Our NeRF inpainting method leverages recent work in 2D image inpainting and is guided by a user-provided mask. Our algorithm is underpinned by a confidence-based view selection procedure. It chooses which of the individual 2D inpainted images to use in the creation of the NeRF, so that the resulting inpainted NeRF is 3D consistent. We show that our method for NeRF editing is effective for synthesizing plausible inpaintings in a multi-view coherent manner. We validate our approach using a new and still-challenging dataset for the task of NeRF inpainting.
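The abstract does not say how confidence is measured, so the sketch below only illustrates the general idea of filtering per-view inpaintings by agreement. A pixel-wise median across views stands in for a 3D-consistent render. Views are scored by how closely they match it, and low-confidence views are dropped. The median consensus, the `exp(-mse)` score, the threshold, and the pre-aligned pixel lists are all hypothetical simplifications.

```python
import math
from statistics import median

def select_views(inpainted, threshold=0.5):
    """inpainted: one list of pixel values per view for the masked region, assumed to be
    already aligned to a common reference frame (a simplifying assumption).
    Returns the indices of views whose confidence exceeds `threshold`."""
    n_pix = len(inpainted[0])
    # Pixel-wise median across views: a stand-in for a 3D-consistent render.
    consensus = [median(view[p] for view in inpainted) for p in range(n_pix)]
    selected = []
    for i, view in enumerate(inpainted):
        mse = sum((a - b) ** 2 for a, b in zip(view, consensus)) / n_pix
        confidence = math.exp(-mse)  # 1.0 for perfect agreement, decays with disagreement
        if confidence > threshold:
            selected.append(i)
    return selected

views = [
    [0.50, 0.50, 0.50],
    [0.52, 0.48, 0.50],
    [0.50, 0.50, 0.51],
    [2.00, -1.00, 2.00],  # an inpainting that disagrees with the others
]
print(select_views(views))  # [0, 1, 2]
```

The median is used rather than the mean so that a single wildly inconsistent inpainting cannot drag the consensus toward itself.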

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants. Only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
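Patch-based training, the most common workaround reported for samples too large to process at once, amounts to tiling each image into fixed-size crops. The sketch below is a generic 2D tiling, not any particular challenge entry's code. The patch size, stride, and function names are illustrative assumptions, and the last patch along each axis is shifted inward so the border is always covered.

```python
def patch_starts(size, patch, stride):
    """Start offsets along one axis; the last patch is shifted so the border is covered."""
    last = max(size - patch, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)
    return starts

def extract_patches(image, patch, stride):
    """Tile a 2D image (list of rows) into patch x patch crops.
    Returns (row, col, crop) tuples so per-patch predictions can be stitched back."""
    h, w = len(image), len(image[0])
    return [
        (r, c, [row[c:c + patch] for row in image[r:r + patch]])
        for r in patch_starts(h, patch, stride)
        for c in patch_starts(w, patch, stride)
    ]

image = [[10 * r + c for c in range(5)] for r in range(5)]  # toy 5x5 "scan"
patches = extract_patches(image, patch=3, stride=2)
print(len(patches))  # 4 patches starting at (0,0), (0,2), (2,0), (2,2)
```

Keeping each crop's offset alongside it is what allows overlapping per-patch outputs to be averaged back into a full-size prediction at inference time.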

Attention as a guide for Simultaneous Speech Translation

Sara Papi, Matteo Negri, Marco Turchi

Category: Natural Language Processing

2022-12-15

The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST) that is motivated by an analysis of the existing attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en->{de, es} show that the EDAtt policy achieves overall better results compared to the SimulST state of the art, especially in terms of computational-aware latency.
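As we understand the EDAtt rule, a candidate token is emitted only when little of its encoder-decoder attention falls on the most recent audio frames. A token still attending to the newest audio may change once more speech arrives, so the policy waits. These details go beyond this abstract, so treat them as an assumption. The parameter names `last_frames` and `alpha` are hypothetical, and the sketch works on precomputed attention rows rather than a real model.

```python
def edatt_emit(attn_rows, last_frames, alpha):
    """Return how many candidate tokens to emit now.

    attn_rows[i] holds the encoder-decoder attention of candidate token i over the
    audio frames received so far. Tokens are emitted left to right while the attention
    mass on the last `last_frames` frames stays below `alpha`; the first token that
    still concentrates on the newest audio stops emission until more input arrives."""
    emitted = 0
    for row in attn_rows:
        recent_mass = sum(row[-last_frames:])
        if recent_mass >= alpha:
            break
        emitted += 1
    return emitted

attn_rows = [
    [0.70, 0.20, 0.05, 0.05],  # looks at early audio: safe to emit
    [0.10, 0.60, 0.20, 0.10],  # mass on last 2 frames = 0.3: emit
    [0.00, 0.10, 0.30, 0.60],  # mass on last 2 frames = 0.9: wait
]
print(edatt_emit(attn_rows, last_frames=2, alpha=0.5))  # 2
```

Lowering `alpha` makes the policy more cautious (lower risk of premature output, higher latency), while raising it emits earlier.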

Transfer Learning using Spectral Convolutional Autoencoders on Semi-Regular Surface Meshes

Sara Hahner, Felix Kerkhoff, Jochen Garcke

Category: Computer Vision

2022-12-12

The underlying dynamics and patterns of 3D surface meshes deforming over time can be discovered by unsupervised learning, especially autoencoders, which calculate low-dimensional embeddings of the surfaces. To study the deformation patterns of unseen shapes by transfer learning, we want to train an autoencoder that can analyze new surface meshes without training a new network. Here, most state-of-the-art autoencoders cannot handle meshes of different connectivity and therefore have limited to no generalization capacities to new meshes. Also, reconstruction errors strongly increase in comparison to the errors for the training shapes. To address this, we propose a novel spectral CoSMA (Convolutional Semi-Regular Mesh Autoencoder) network. This patch-based approach is combined with a surface-aware training. It reconstructs surfaces not presented during training and generalizes the deformation behavior of the surfaces' patches. The novel approach reconstructs unseen meshes from different datasets in superior quality compared to state-of-the-art autoencoders that have been trained on these shapes. Our transfer learning errors on unseen shapes are 40% lower than those from models learned directly on the data. Furthermore, baseline autoencoders detect deformation patterns of unseen mesh sequences only for the whole shape. In contrast, due to the employed regional patches and stable reconstruction quality, we can localize where on the surfaces these deformation patterns manifest.
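A spectral convolution on a mesh filters vertex features with polynomials of the graph Laplacian. The sketch below shows a ChebNet-style filter, a standard form of spectral graph convolution, on a toy 4-vertex patch. Whether the spectral CoSMA layers use exactly this variant is an assumption here, and the patch, feature sizes, and weights are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a mesh vertex graph (no isolated vertices)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def cheb_conv(x, laplacian, thetas):
    """Chebyshev spectral filter: sum_k T_k(L_scaled) @ x @ thetas[k].
    x: (n_vertices, f_in) features; thetas: K weight matrices of shape (f_in, f_out)."""
    lam_max = np.linalg.eigvalsh(laplacian).max()
    l_scaled = 2.0 * laplacian / lam_max - np.eye(len(laplacian))  # spectrum mapped to [-1, 1]
    t_prev, t_cur = x, l_scaled @ x                # T_0(L) x and T_1(L) x
    out = t_prev @ thetas[0]
    if len(thetas) > 1:
        out = out + t_cur @ thetas[1]
    for k in range(2, len(thetas)):
        t_prev, t_cur = t_cur, 2.0 * l_scaled @ t_cur - t_prev  # T_k = 2 L T_{k-1} - T_{k-2}
        out = out + t_cur @ thetas[k]
    return out

# Toy "patch": 4 vertices in a cycle, xyz coordinates as input features.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0.2]])
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 8)) for _ in range(3)]  # K = 3 filter taps
out = cheb_conv(x, normalized_laplacian(adj), thetas)
print(out.shape)  # (4, 8)
```

Each filter tap k mixes information from vertices up to k edges away. When every patch shares the same regular connectivity, as on semi-regular meshes, the same learned weights can be applied to patches of any shape.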

Testing GLOM's ability to infer wholes from ambiguous parts

Laura Culp, Sara Sabour, Geoffrey E. Hinton

Category: Computer Vision | Machine Learning

2022-11-29

The GLOM architecture proposed by Hinton [2021] is a recurrent neural network for parsing an image into a hierarchy of wholes and parts. When a part is ambiguous, GLOM assumes that the ambiguity can be resolved by allowing the part to make multi-modal predictions for the pose and identity of the whole to which it belongs and then using attention to similar predictions coming from other possibly ambiguous parts to settle on a common mode that is predicted by several different parts. In this study, we describe a highly simplified version of GLOM that allows us to assess the effectiveness of this way of dealing with ambiguity. Our results show that, with supervised training, GLOM is able to successfully form islands of very similar embedding vectors for all of the locations occupied by the same object and it is also robust to strong noise injections in the input and to out-of-distribution input transformations.
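The "islands" idea can be illustrated with one ingredient of GLOM: each location moves toward a similarity-weighted average of the embeddings at other locations, so vectors that already roughly agree converge. The sketch below is a heavily simplified, single-level illustration. The softmax temperature `beta`, the mixing weight `mix`, the toy data, and the function names are assumptions. The sketch omits GLOM's bottom-up and top-down networks and is not the simplified model studied in the paper.

```python
import numpy as np

def attention_step(emb, beta=5.0, mix=0.5):
    """One simplified GLOM-style update at a single level.
    emb: (n_locations, d) unit-norm embeddings. Each location moves toward the
    similarity-weighted average of all embeddings, so locations that already
    roughly agree pull together into an 'island'."""
    sims = beta * (emb @ emb.T)
    sims -= sims.max(axis=1, keepdims=True)        # stabilize the softmax
    weights = np.exp(sims)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    updated = (1 - mix) * emb + mix * (weights @ emb)
    return updated / np.linalg.norm(updated, axis=1, keepdims=True)

def mean_cos(a, b):
    """Mean cosine similarity over all row pairs of a and b (unit-norm inputs)."""
    return float(np.mean(a @ b.T))

# Two objects: 5 noisy "part" vectors near each of two orthogonal directions.
rng = np.random.default_rng(0)
d = 8
emb = np.repeat(np.eye(d)[:2], 5, axis=0) + 0.3 * rng.normal(size=(10, d))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

before = mean_cos(emb[:5], emb[:5])
for _ in range(10):
    emb = attention_step(emb)
print(before, mean_cos(emb[:5], emb[:5]), mean_cos(emb[:5], emb[5:]))
```

Because attention weights fall off exponentially with dissimilarity, vectors from the other object contribute little, so agreement grows mainly within each group.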

Text Representation Enrichment Utilizing Graph based Approaches: Stock Market Technical Analysis Case Study

Sara Salamat, Nima Tavassoli, Behnam Sabeti, Reza Fahmi

Category: Machine Learning

2022-11-29

Graph neural networks (GNNs) have been utilized for various natural language processing (NLP) tasks lately. The ability to encode corpus-wide features in graph representation made GNN models popular in various tasks such as document classification. One major shortcoming of such models is that they mainly work on homogeneous graphs, while representing text datasets as graphs requires several node types, which leads to a heterogeneous schema. In this paper, we propose a transductive hybrid approach composed of an unsupervised node representation learning model followed by a node classification/edge prediction model. The proposed model is capable of processing heterogeneous graphs to produce unified node embeddings, which are then utilized for node classification or link prediction as the downstream task. The proposed model is developed to classify stock market technical analysis reports; to our knowledge, this is the first work in this domain. Experiments carried out on a constructed dataset demonstrate the ability of the model in embedding extraction and the downstream tasks.
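The notion of "unified node embeddings" for a heterogeneous graph can be illustrated with type-specific projections. Nodes of different types, with different raw feature sizes, are mapped into one shared space, where a dot-product score can rank candidate links. The sketch assumes hypothetical node types ("report", "term"), toy dimensions, and random untrained projections. The paper's actual unsupervised representation model is not described in the abstract and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
unified_dim = 4

# Hypothetical heterogeneous graph: node types have different raw feature sizes.
raw_features = {
    "report": rng.normal(size=(3, 6)),  # 3 report nodes, 6 raw features each
    "term": rng.normal(size=(5, 2)),    # 5 term nodes, 2 raw features each
}
# One projection per node type maps every type into the same embedding space
# (random here; a real model would learn these).
projections = {t: rng.normal(size=(f.shape[1], unified_dim)) for t, f in raw_features.items()}

def embed(node_type, idx):
    """Unified embedding of node `idx` of the given type."""
    return raw_features[node_type][idx] @ projections[node_type]

def link_score(u, v):
    """Score in (0, 1) for an edge between two typed nodes: sigmoid of a dot product."""
    z = float(embed(*u) @ embed(*v))
    return 1.0 / (1.0 + np.exp(-z))

print(link_score(("report", 0), ("term", 2)))
```

Once every node lives in the same space, the downstream task can be node classification (a classifier on `embed(...)`) or link prediction (`link_score`) without caring about the original node types.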
